Abstract: Document clustering is a powerful technique for detecting topics and their relationships to support information browsing, analysis, and organization. However, clusters must be assigned descriptive titles after they are formed so that users can interpret the results. Existing techniques often label clusters using only the terms that the clustered documents contain, which may not be sufficient for some applications; moreover, term-based labels do not clearly convey the meaning of the clustered content. To address this problem, this work considers phrase-based cluster labeling. It enriches terms with external knowledge from WordNet and provides an approach to derive the theme of a group of documents and label that group with the most appropriate phrase. A number of experiments were conducted on benchmark datasets, and the resulting labels were observed to describe the clusters formed accurately.
Keywords: Thematic Phrases, Document Clustering, Information Browsing, Analysis, Organization.